Attribute Relation Extraction from Template-inconsistent Semi-structured Text by Leveraging Site-level Knowledge

Authors

Yang Liu

Fang Liu

Siwei Lai

Kang Liu

Guangyou Zhou

Jun Zhao

Abstract

A variety of methods have been proposed for attribute-value extraction from semi-structured text with consistent templates (strict semi-text). However, when the templates in semi-structured text are inconsistent (weak semi-text), these methods perform poorly. To overcome the template-inconsistency problem, we propose a novel method that leverages site-level knowledge for attribute-value extraction. First, we use a graph-based random walk model to acquire site-level knowledge. Then we use this knowledge to identify weak semi-text in each page and extract attribute-value pairs. Experiments show that, compared to a baseline method that does not use site-level knowledge, our method improves extraction performance significantly.
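The abstract does not specify the graph or the walk, so the following is only a minimal sketch of one common graph-based random walk (a random walk with restart), not the authors' model. It assumes a hypothetical graph whose nodes are candidate attribute labels collected across the pages of one site, whose edge weights are co-occurrence counts, and whose seed labels come from strict semi-text. The function name, the `restart` parameter, and the example labels are all illustrative assumptions.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seeds, restart=0.15, iterations=50):
    """Score nodes with a random walk that restarts at the seed nodes.

    edges: iterable of (node_a, node_b, positive_weight), treated as undirected.
    seeds: set of nodes assumed to be reliable attribute labels (hypothetical).
    """
    # Build a symmetric weighted adjacency map.
    neighbors = defaultdict(dict)
    for u, v, w in edges:
        neighbors[u][v] = neighbors[u].get(v, 0.0) + w
        neighbors[v][u] = neighbors[v].get(u, 0.0) + w
    nodes = sorted(neighbors)

    # Restart distribution: uniform over the seeds, zero elsewhere.
    prior = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(prior)

    for _ in range(iterations):
        # Each step: restart with probability `restart`, otherwise move to a
        # neighbor with probability proportional to the edge weight.
        nxt = {n: restart * prior[n] for n in nodes}
        for u in nodes:
            total = sum(neighbors[u].values())
            for v, w in neighbors[u].items():
                nxt[v] += (1.0 - restart) * scores[u] * w / total
        scores = nxt
    return scores


# Hypothetical site-level graph: labels that co-occur on product pages,
# plus two page-chrome labels that never co-occur with them.
edges = [
    ("price", "brand", 3.0),
    ("brand", "weight", 2.0),
    ("weight", "color", 1.0),
    ("login", "share", 1.0),
]
scores = random_walk_with_restart(edges, seeds={"price"})
for label in sorted(scores, key=scores.get, reverse=True):
    print(f"{label}: {scores[label]:.3f}")
```

In this sketch, labels connected to the seed accumulate score, while labels in a disconnected component (here the page-chrome labels) stay at zero. A threshold on the scores could then serve as a site-level attribute vocabulary for spotting weak semi-text, though how the paper actually uses the acquired knowledge is not stated in the abstract.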

Similar resources

A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...

Agricultural Knowledge Discovery from Semi-Structured Text

This research aims to develop an automatic knowledge discovery system for semi-structured Thai text to support plant diagnosis. Plant disease diagnosis is very important for farmers, who need to cure infected plants before infections become more severe. Prior to diagnosis, farmers need to gain knowledge retrieved primarily from text, including unstructured and semi-structured documents. As th...

High-Precision Web Extraction Using Site Knowledge

In this paper, we study the problem of extracting structured records from semi-structured Web pages. Existing Web information extraction techniques like wrapper induction require a large amount of editorial effort for annotating pages. Other schemes based on Conditional Random Fields (CRFs) suffer from precision loss due to variable site structures and abundance of noise in Web pages. In this p...

Knowledge Base Augmentation using Tabular Data

Large linked data repositories have been built by leveraging semi-structured data in Wikipedia (e.g., DBpedia) and through extracting information from natural language text (e.g., YAGO). However, the Web contains many other vast sources of linked data, such as structured HTML tables and spreadsheets. Often, the semantics in such tables is hidden, preventing one from extracting triples from them...

A Fuzzy Approach for Pertinent Information Extraction from Web Resources

Recent work in machine learning for information extraction has focused on two distinct sub-problems: the conventional problem of filling template slots from natural language text, and the problem of wrapper induction, learning simple extraction procedures (“wrappers”) for highly structured text such as Web pages. For suitable regular domains, existing wrapper induction algorithms can efficientl...

Publication date: 2013
